Abstract



A feedforward multilayered neural network has been trained to "recognize" true
V0's in the presence of a large combinatoric background using simulated data for
2 GeV/nucleon Ni + Cu interactions. The resulting neural network filter has been
applied to actual data from the EOS TPC experiment. An enhancement of signal
to background over more traditional selection mechanisms has been observed.



1 Introduction 



A high statistics sample of Λ's and K0s's produced in 2 GeV/nucleon Ni + Cu
collisions has been obtained with the EOS Time Projection Chamber [1]. These
neutral strange particles, or V0's, are reconstructed through their charged
particle decays: Λ → p + π− and K0s → π+ + π−. The combined acceptance and
efficiency for detecting true V0's with the EOS TPC is very good; however,
the sample is contaminated by a large combinatoric background of false Λ's
and K0s's (e.g. ~40,000 pπ− pairs for every true Λ).

Traditionally, the signal is extracted from the background by cutting on certain
parameters (such as the distance of the decay from the main vertex) whose
distributions are different for signal and background. Inevitably, such cuts
eliminate a significant fraction of the signal as well, and one is confronted with
the task of optimizing the cuts. This optimization problem is a natural
candidate for neural network techniques. A feedforward, multilayered neural
network filter has been devised which, when applied to the EOS data, results
in a cleaner, higher statistics sample of Λ's and K0s's.
Fig. 1. Schematic diagram illustrating Λ decay and reconstruction.

2 V0 Reconstruction

A schematic diagram illustrating Λ reconstruction and the set of parameters
used to separate the signal from the background is shown in Fig. 1. V0
reconstruction begins after all TPC tracks in an event have been found and the
overall event vertex has been determined. In the case of Λ's, each pair of pπ−
tracks is looped over and their point of closest approach is calculated. Pairs
whose trajectories approximately intersect at a point other than the main
vertex are fit with a V0 hypothesis from which the invariant mass, momentum,
and point of decay (X_Λ) are extracted.

The Λ momentum vector is projected back to the target to obtain d, the
distance between X_Λ and the overall event vertex, and b, the impact parameter
or distance of closest approach between the event vertex and the Λ trajectory.
Likewise, the proton and π− tracks are projected back to the target to obtain
dca_p and dca_π, which are the distances of closest approach of the daughter
particle trajectories to the main vertex. Another useful variable, closely related
to the two dca's, is the distance between the p and π− trajectories at the target
plane, d_pπ, while the distance between the p and π− trajectories at X_Λ is called
Δr. In an ideal detector b and Δr would be exactly zero for true Λ's and K0s's.
In any real detector, of course, these quantities will take on finite values. The
estimated impact parameter resolution for the EOS TPC is σ_b ~ 3 mm.

Seven parameters are cut on to separate true Λ's and K0s's from the
background: d, b, Δr, d_pπ, dca_p, dca_π, and the χ²/ν of the combined fit to the V0
hypothesis. The traditional method is to simply define a seven dimensional
cut such as: d > 4 cm AND b < 3 mm AND d_pπ > 2 cm AND etc. The
problem then becomes what the precise values of the cuts should be. For
example, nearly all of the combinatoric background can be eliminated by simply
requiring that d be very large. Since the true V0's follow an exponential decay
law, however, such a cut would throw out most of the true signal as well.

Fig. 2. Λ invariant mass spectrum from seven parameter cuts.
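As a rough illustration of the traditional method, the sketch below applies a rectangular seven-parameter cut. Only the first three thresholds (d > 4 cm, b < 3 mm, d_pπ > 2 cm) come from the example above; the other four thresholds and their directions, the `V0Candidate` class, and any decay length passed to `surviving_fraction` are hypothetical placeholders, not the values used in the analysis.

```python
import math
from dataclasses import dataclass


@dataclass
class V0Candidate:
    """Hypothetical container for the seven cut parameters (lengths in cm)."""
    d: float        # decay distance from the event vertex
    b: float        # V0 impact parameter
    delta_r: float  # daughter separation at the decay point
    d_ppi: float    # daughter separation at the target plane
    dca_p: float    # proton distance of closest approach to the vertex
    dca_pi: float   # pion distance of closest approach to the vertex
    chi2_nu: float  # chi-square per degree of freedom of the V0 fit


def passes_box_cut(v: V0Candidate) -> bool:
    """Logical AND of seven independent one-dimensional cuts."""
    return (
        v.d > 4.0            # from the text: d > 4 cm
        and v.b < 0.3        # from the text: b < 3 mm
        and v.d_ppi > 2.0    # from the text: d_ppi > 2 cm
        and v.delta_r < 0.5  # placeholder value
        and v.dca_p > 0.5    # placeholder value
        and v.dca_pi > 1.0   # placeholder value
        and v.chi2_nu < 3.0  # placeholder value
    )


def surviving_fraction(d_cut_cm: float, decay_length_cm: float) -> float:
    """Fraction of exponentially decaying V0's with decay distance > d_cut."""
    return math.exp(-d_cut_cm / decay_length_cm)
```

Because the surviving fraction falls exponentially with the cut value, a d cut several decay lengths from the vertex removes nearly all of the true signal along with the background.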

For the EOS data, an attempt to optimize the cuts has been made through trial
and error using the invariant mass distribution as a guide. The invariant mass
distribution resulting from the "best" cuts for Λ's is shown in Fig. 2. From
Monte Carlo simulations it is estimated that over 60% of the reconstructed
true Λ's are lost in making the cuts necessary to obtain the background level
in this plot.

Clearly, searching a seven dimensional parameter space by trial and error in an
effort to optimize the cuts can be a very tedious process. In addition, the high
degree of correlation among some of the parameters makes it unlikely that the
optimum cut would be obtained by cutting perpendicular to each of the seven
axes as above. An uncorrelated set of parameters could be formed from linear
combinations of the original seven parameters by using a principal component
analysis. However, one would still be left with the task of optimizing the
cuts in the new parameter space. Moreover, any boundary surface chosen to
separate the true V0's from the background would still be restricted to a seven
dimensional polyhedron. An alternative method of cutting, which may allow
for more complicated shapes in the seven dimensional space, is provided by
neural network techniques.
Fig. 3. V0 neural network topology.

3 Neural Network Approach



A general feedforward multilayered network consists of a set of input neurons,
one or more layers of hidden neurons, a set of output neurons, and synapses
connecting each layer to the subsequent layer [2]. A particular network
topology for the application at hand is shown in Fig. 3, where the inputs, a_i, are the
seven V0 parameters. There are two hidden layers, b_j and c_k, and one output,
o. The network is fully connected in the sense that there is a synapse
connecting each a neuron to each b neuron, each b neuron to each c neuron, and each
c neuron to the output. Given a set of inputs, the rules for calculating o are:



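$$ b_j = \tanh\Big(\sum_i w^{b}_{ij}\, a_i - \theta_j\Big), \tag{1} $$

$$ c_k = \tanh\Big(\sum_j w^{c}_{jk}\, b_j - \theta_k\Big), \tag{2} $$

$$ o = \sum_k w^{o}_{k}\, c_k, \tag{3} $$

where the w's are synaptic weights and the θ's are thresholds.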
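The forward rules (1) to (3) can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the analysis: the function name `forward`, the nested-list weight layout, and the weight names `w_b`, `w_c`, `w_o` are assumptions made here, and the sizes of the two hidden layers are left to the caller because the text does not state them.

```python
import math


def forward(a, w_b, w_c, w_o, theta_b, theta_c):
    """Evaluate Eqs. (1)-(3) for one V0 candidate.

    a            : list of input values a_i
    w_b[i][j]    : weight from input i to first hidden neuron j
    w_c[j][k]    : weight from first hidden neuron j to second hidden neuron k
    w_o[k]       : weight from second hidden neuron k to the output
    theta_b[j], theta_c[k] : thresholds of the two hidden layers
    """
    # Eq. (1): first hidden layer
    b = [
        math.tanh(sum(w_b[i][j] * a[i] for i in range(len(a))) - theta_b[j])
        for j in range(len(theta_b))
    ]
    # Eq. (2): second hidden layer
    c = [
        math.tanh(sum(w_c[j][k] * b[j] for j in range(len(b))) - theta_c[k])
        for k in range(len(theta_c))
    ]
    # Eq. (3): linear output neuron
    o = sum(w_o[k] * c[k] for k in range(len(c)))
    return b, c, o
```

Since the output neuron is linear in the c's, which are bounded by tanh, the magnitude of o can never exceed the sum of the absolute values of the output weights.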

For the current application one would like o to take on one value for true
V0's and a different value for false V0's, e.g. +1 for true and -1 for false. The
problem then becomes one of finding a set of weights and thresholds that give
the desired outputs. In general, this is accomplished by starting with random
initial guesses for the w's and θ's and "teaching" the network with a training
set and a backpropagation algorithm.

A set of ~2.5 × 10^5 2 GeV/nucleon Ni + Cu events generated with the ARC
cascade code [3] has been used to train and test the network of Fig. 3. Only
those events which had some strangeness content in the final state were used
in the training stage. The strange events were run through a detailed GEANT
simulation of the TPC and passed through the same analysis chain as was the
real data. Loose cuts on the seven V0 parameters were applied to the output in
order to weed out easy background. The resulting training set was composed of
3757 true V0's and 41,600 combinatoric V0's. The V0's were labeled as being
either true or false based on information stored from GEANT and were then
passed one at a time through the neural net. Separate, though topologically
identical, networks were used for Λ's and K0s's.

As each V0 is passed through its appropriate network the weights and
thresholds are adjusted so that the actual output approaches the desired output.
This is done by minimizing the error function:



$$ E = \tfrac{1}{2}(t - o)^2 = \tfrac{1}{2}\Big(t - \sum_k w^{o}_{k}\, c_k\Big)^2, \tag{4} $$

where t = +1 for true V0's and t = −1 for background or fake V0's. A simple
gradient descent algorithm:

$$ \Delta w_{ij} = -\eta\, \frac{\partial E}{\partial w_{ij}}, \tag{5} $$

$$ \Delta \theta_j = -\eta\, \frac{\partial E}{\partial \theta_j}, \tag{6} $$

is used to adjust the w's and θ's after o is calculated for each V0. In the
present application η was chosen to be 0.05 and all of the thresholds were held
fixed at zero.
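A per-candidate update of this kind can be sketched as follows. This is an illustrative sketch under stated assumptions, not the original training code: the function names, the hidden layer sizes chosen by the caller, and the uniform initial weight range of ±0.5 are all hypothetical. The thresholds are fixed at zero as stated above, so only the weights are updated, and the gradients are obtained by applying the chain rule to E = ½(t − o)².

```python
import math
import random


def init_weights(n_in, n_b, n_c, rng):
    """Random initial weights; the +/-0.5 range is an assumption."""
    w_b = [[rng.uniform(-0.5, 0.5) for _ in range(n_b)] for _ in range(n_in)]
    w_c = [[rng.uniform(-0.5, 0.5) for _ in range(n_c)] for _ in range(n_b)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_c)]
    return w_b, w_c, w_o


def forward(a, w_b, w_c, w_o):
    """Forward pass with all thresholds held at zero."""
    b = [math.tanh(sum(a[i] * w_b[i][j] for i in range(len(a))))
         for j in range(len(w_c))]
    c = [math.tanh(sum(b[j] * w_c[j][k] for j in range(len(b))))
         for k in range(len(w_o))]
    o = sum(w_o[k] * c[k] for k in range(len(c)))
    return b, c, o


def train_step(a, t, w_b, w_c, w_o, eta=0.05):
    """One gradient descent update in place; returns E before the update."""
    b, c, o = forward(a, w_b, w_c, w_o)
    err = o - t  # dE/do for E = 0.5 * (t - o)**2
    # Backpropagated error terms, computed with the weights before updating.
    delta_c = [err * w_o[k] * (1.0 - c[k] ** 2) for k in range(len(c))]
    delta_b = [(1.0 - b[j] ** 2)
               * sum(delta_c[k] * w_c[j][k] for k in range(len(c)))
               for j in range(len(b))]
    # Delta w = -eta * dE/dw for each layer of synapses.
    for k in range(len(c)):
        w_o[k] -= eta * err * c[k]
    for j in range(len(b)):
        for k in range(len(c)):
            w_c[j][k] -= eta * delta_c[k] * b[j]
    for i in range(len(a)):
        for j in range(len(b)):
            w_b[i][j] -= eta * delta_b[j] * a[i]
    return 0.5 * (t - o) ** 2
```

A training cycle in this picture is one loop over the labeled candidates, calling `train_step` with t = +1 for true V0's and t = −1 for combinatoric ones. The sketch assumes inputs of order unity; whether and how the seven parameters were rescaled is not stated in the text.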

In principle, the training process should continue until the weights cease to 
change. In practice the same set of events was passed repeatedly through the 
network in alternating training and testing cycles. After each training cycle the 
events were filtered back through the network and a histogram of the resulting 
outputs (similar to Fig. 4) was visually inspected to judge convergence. The 
cpu time per training cycle was ~12 minutes on a 55 MHz HyperSparc. After 
10 cycles the networks were judged to have converged. 
Fig. 4. Neural network output for (a) true and (b) false Monte Carlo Λ's.

4 Results

The performance of the networks was first tested on a subset of ~1.73 × 10^5
ARC events which had not been preselected for strangeness. Although all of
the events containing V0's in this subset were also members of the training set,
they had been rerun through GEANT with different random number seeds.
The same loose cuts on the seven parameters were applied to the GEANT
output as in the training phase, resulting in 2601 true and ~1.75 × 10^5 false
V0's, roughly the same 1:7 ratio as is observed in the EOS data after loose
cuts. The V0's were passed through the neural network filters and the resulting
distributions of outputs for Λ's are shown in Fig. 4.

Qualitatively, one sees that the Λ network performs as desired: the distribution
for the true Λ's has a sharp peak at +1 while the combinatoric distribution is
peaked at -1. Quantitatively, one can define a purity:

$$ \mathrm{Purity} = \frac{\mathrm{true}}{\mathrm{true} + \mathrm{false}}, $$

and a yield:

$$ \mathrm{Yield} = \frac{\mathrm{detected\ true}}{\mathrm{actual\ true}}. $$
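For a labeled Monte Carlo sample, these two figures of merit can be computed for any cut on the network output with a short helper such as the one below. The function name, its arguments, and the convention of returning a purity of 0.0 when nothing passes the cut are assumptions made for this sketch.

```python
def purity_and_yield(outputs, is_true, o_cut, n_actual_true):
    """Purity and yield for candidates with network output above o_cut.

    outputs       : network output o for each candidate
    is_true       : matching booleans from the Monte Carlo truth labels
    n_actual_true : number of true V0's used as the yield denominator
    """
    accepted_true = sum(1 for o, truth in zip(outputs, is_true)
                        if o > o_cut and truth)
    accepted_false = sum(1 for o, truth in zip(outputs, is_true)
                         if o > o_cut and not truth)
    accepted = accepted_true + accepted_false
    # Convention for this sketch: purity is 0.0 when no candidate passes.
    purity = accepted_true / accepted if accepted else 0.0
    yield_ = accepted_true / n_actual_true
    return purity, yield_
```

Scanning `o_cut` over a range of values traces out the trade-off between purity and yield for the network filter.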

The purity and yield factors for the traditional method of cutting and for cuts
on the value of the neural network output are listed in Table 1. All yields reflect
a common ~60% loss factor arising from geometrical acceptance and tracking
efficiency. From the table one sees that, for the Monte Carlo events, the neural
network method gives a significantly higher yield than the traditional method
at the same level of purity. Alternatively, higher purity levels can be obtained
without loss of yield. Similar results are obtained with the K0s network.

Table 1: Purity and yield factors for Monte Carlo Λ's.

The ultimate test of a neural network filter is its performance on actual data.
Although yield and purity factors cannot be calculated, the overall
performance can be judged by comparing the invariant mass distributions which
result from cutting on o to those which result from the traditional method of
cutting. The distribution for EOS Λ's obtained by requiring o > 0.95 is shown
in Fig. 5 on the same scale as the distribution of Fig. 2. A higher peak and
lower background are clearly evident in the neural net filtered distribution.
When invariant mass cuts are applied (1112 MeV/c^2 < M_Λ < 1120 MeV/c^2),
the neural net method gives 1797 Λ candidates as opposed to 1362 for the
traditional method, an increase of about 32%.
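The mass window can be illustrated with a small sketch that computes the pπ− invariant mass from the daughter momenta and tests it against the 1112 to 1120 MeV/c^2 window. In the analysis the mass comes from the V0 fit, so computing it directly from two momentum vectors is a simplification assumed here; the function names are hypothetical, while the proton and charged pion masses are the standard values.

```python
import math

M_PROTON = 938.272  # proton mass, MeV/c^2
M_PION = 139.570    # charged pion mass, MeV/c^2


def invariant_mass(p1, m1, p2, m2):
    """Invariant mass (MeV/c^2) of a pair from 3-momenta given in MeV/c."""
    e1 = math.sqrt(m1 * m1 + sum(x * x for x in p1))
    e2 = math.sqrt(m2 * m2 + sum(x * x for x in p2))
    p_sum_sq = sum((x + y) ** 2 for x, y in zip(p1, p2))
    return math.sqrt((e1 + e2) ** 2 - p_sum_sq)


def in_lambda_window(mass, low=1112.0, high=1120.0):
    """True if the mass lies inside the window quoted in the text."""
    return low < mass < high
```

For example, a proton and a π− emitted back to back with about 100.6 MeV/c each give a mass inside the window.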



5 Conclusions 



For the EOS TPC data the neural network approach results in significant
enhancements in both the yields and purities of Λ's and K0s's compared to
the straightforward method of cutting in seven dimensional parameter space.
The Λ candidates in the peaks of Figs. 2 and 5 were projected onto the seven
parameter axes and a comparison of the resulting distributions was made. In
general, the edges of the distributions for the neural network filtered events are
less sharp, lending support to the intuitive notion that the neural filter finds a
smoother hypersurface in the parameter space.
Fig. 5. Λ invariant mass spectrum from neural network cut.

When working with neural network filters it is obviously important to ensure
that the training set matches the data to be filtered as closely as possible.
This was observed in the present study when the neural network was
initially trained with a set of ARC events which did not include the coalescence
of protons and neutrons to form deuterons. The result was a pair of V0 neural
networks which performed only marginally better than the seven parameter cuts
method.

The neural network topology of Fig. 3 was found to work so well that other
topologies were not investigated. It is possible that alternative network
architectures could give even better results. One of the disadvantages of the neural
network approach is that there exists no a priori prescription for evaluating
various topologies; one must simply proceed through trial and error.

The cpu time spent in training the networks was not significant; however, the
time spent in generating the training set was quite large: ~5 cpu minutes/event
× ~17,000 strange ARC events, or roughly 85,000 cpu minutes. Since the GEANT
simulations were being done anyway in order to study acceptance and efficiency
issues, the net cpu overhead on data processing due to neural network training
can be considered to be negligible.